Background
This suite of tutorials was developed for a workshop at the 2021 R-Medicine Conference by the Healthy Regions & Policies Lab at the University of Chicago. This workbook is a quick (3-hour) overview of mapping, GIScience, and spatial analysis basics for health professionals. The workbook was compiled by Marynia Kolak, and the overview for each section is led by Susan Paykin in the live version.
Some coding snippets & data examples are from the phenomenal team of the Opioid Environment Toolkit (Moksha Menghaney, Qinyun Lin, Angela Li). The overall approach follows the Center for Spatial Data Science paradigm, led by Luc Anselin & Julia Koschinsky.
Environment Setup
A basic understanding of R is assumed. This workshop requires several packages, which can be installed from CRAN:
For Mac users, check out https://github.com/r-spatial/sf for additional tips if you run into errors when installing the sf package. Using homebrew to install gdal usually fixes any remaining issues.
1 Intro to Spatial Data
In the workshop, we learned about:
- What is Spatial Data?
- What is the
sfframework for R?
To delve in further, let’s see some spatial data in action.
We’ll work with the sf library first.
1.1 Load Spatial Data
First load in the shapefile. Remember, this type of data is actually comprised of multiple files. All need to be present in order to read correctly.
## Reading layer `geo_export_aae47441-adab-4aca-8cb0-2e0c0114096e' from data source
## `/Users/maryniakolak/code/Intro2RSpatialMed/data/geo_export_aae47441-adab-4aca-8cb0-2e0c0114096e.shp'
## using driver `ESRI Shapefile'
## Simple feature collection with 801 features and 9 fields
## Geometry type: POLYGON
## Dimension: XY
## Bounding box: xmin: -87.94025 ymin: 41.64429 xmax: -87.52366 ymax: 42.02392
## CRS: 4326
1.2 Non-Spatial & Spatial Views
Always inspect data when loading in. First we look at a non-spatial view.
## Simple feature collection with 6 features and 9 fields
## Geometry type: POLYGON
## Dimension: XY
## Bounding box: xmin: -87.68822 ymin: 41.72902 xmax: -87.62394 ymax: 41.87455
## CRS: 4326
## commarea commarea_n countyfp10 geoid10 name10 namelsad10
## 1 44 44 031 17031842400 8424 Census Tract 8424
## 2 59 59 031 17031840300 8403 Census Tract 8403
## 3 34 34 031 17031841100 8411 Census Tract 8411
## 4 31 31 031 17031841200 8412 Census Tract 8412
## 5 32 32 031 17031839000 8390 Census Tract 8390
## 6 28 28 031 17031838200 8382 Census Tract 8382
## notes statefp10 tractce10 geometry
## 1 <NA> 17 842400 POLYGON ((-87.62405 41.7302...
## 2 <NA> 17 840300 POLYGON ((-87.68608 41.8229...
## 3 <NA> 17 841100 POLYGON ((-87.62935 41.8528...
## 4 <NA> 17 841200 POLYGON ((-87.68813 41.8556...
## 5 <NA> 17 839000 POLYGON ((-87.63312 41.8744...
## 6 <NA> 17 838200 POLYGON ((-87.66782 41.8741...
Note the last column – this is a spatially enabled column. The data is no longer a ‘shapefile’ but an `sf’ object, comprised of polygons.
We can use a baseR function to view the spatial dimension. The sf framework enables previews of each attribute in our spatial file.

1.3 Spatial Data Structure
Check out the data structure of this file… What object is it?
## Classes 'sf' and 'data.frame': 801 obs. of 10 variables:
## $ commarea : Factor w/ 77 levels "1","10","11",..: 39 55 28 25 26 21 62 49 74 75 ...
## $ commarea_n: num 44 59 34 31 32 28 65 53 76 77 ...
## $ countyfp10: Factor w/ 1 level "031": 1 1 1 1 1 1 1 1 1 1 ...
## $ geoid10 : Factor w/ 801 levels "17031010100",..: 785 767 772 773 756 751 584 513 684 34 ...
## $ name10 : Factor w/ 801 levels "1001","1002",..: 782 764 769 770 753 748 545 443 663 266 ...
## $ namelsad10: Factor w/ 801 levels "Census Tract 1001",..: 782 764 769 770 753 748 545 443 663 266 ...
## $ notes : Factor w/ 7 levels "Half in CA 64 (Midway Airport)",..: NA NA NA NA NA NA NA NA NA NA ...
## $ statefp10 : Factor w/ 1 level "17": 1 1 1 1 1 1 1 1 1 1 ...
## $ tractce10 : Factor w/ 801 levels "010100","010201",..: 785 767 772 773 756 751 584 513 684 34 ...
## $ geometry :sfc_POLYGON of length 801; first list element: List of 1
## ..$ : num [1:243, 1:2] -87.6 -87.6 -87.6 -87.6 -87.6 ...
## ..- attr(*, "class")= chr "XY" "POLYGON" "sfg"
## - attr(*, "sf_column")= chr "geometry"
## - attr(*, "agr")= Factor w/ 3 levels "constant","aggregate",..: NA NA NA NA NA NA NA NA NA
## ..- attr(*, "names")= chr "commarea" "commarea_n" "countyfp10" "geoid10" ...
Check out the coordinate reference system. What is it? What are the units?
## Coordinate Reference System:
## User input: 4326
## wkt:
## GEOGCS["WGS84(DD)",
## DATUM["WGS84",
## SPHEROID["WGS84",6378137.0,298.257223563]],
## PRIMEM["Greenwich",0.0],
## UNIT["degree",0.017453292519943295],
## AXIS["Geodetic longitude",EAST],
## AXIS["Geodetic latitude",NORTH]]
1.4 Exploring Coordinate Reference Systems
Lets see how switching CRS changes our object. First we’ll try the Mollweide coordinate reference system that does a good job preserving area across the globe.
To transform our CRS, we use the st_transform function. To plot, we use baseR again but with some paremeter updates. Finally, we check out the CRS of our new object. What are the units? Any other details to note? Will this be appropriate for our spatial analysis?
Chi_tracts.moll <- st_transform(Chi_tracts, crs = "+proj=moll")
plot(st_geometry(Chi_tracts.moll), border = "gray", lwd = 2, main = "Mollweide", sub="preserves areas")
## Coordinate Reference System:
## User input: +proj=moll
## wkt:
## PROJCS["unnamed",
## GEOGCS["WGS 84",
## DATUM["unknown",
## SPHEROID["WGS84",6378137,298.257223563]],
## PRIMEM["Greenwich",0],
## UNIT["degree",0.0174532925199433]],
## PROJECTION["Mollweide"],
## PARAMETER["central_meridian",0],
## PARAMETER["false_easting",0],
## PARAMETER["false_northing",0]]
Next, we’ll try the Winkel CRS, which is a compromise projection that facilitates minimal distortion for area, distance, and angles. We use the same approach, recyling the code with new inputs.
Chi_tracts.54019 = st_transform(Chi_tracts, 54019)
plot(st_geometry(Chi_tracts.54019), border = "gray", lwd = 2, main = "Winkel", sub="minimal distortion")
## Coordinate Reference System:
## User input: EPSG:54019
## wkt:
## PROJCS["World_Winkel_II",
## GEOGCS["GCS_WGS_1984",
## DATUM["WGS_1984",
## SPHEROID["WGS_84",6378137.0,298.257223563]],
## PRIMEM["Greenwich",0.0],
## UNIT["Degree",0.0174532925199433]],
## PROJECTION["Winkel_II"],
## PARAMETER["False_Easting",0.0],
## PARAMETER["False_Northing",0.0],
## PARAMETER["Central_Meridian",0.0],
## PARAMETER["Standard_Parallel_1",50.45977625218981],
## UNIT["Meter",1.0],
## AUTHORITY["Esri","54019"]]
We could also try a totally different projection, to see how that changes our spatial object. Let’s use the “Old Hawaiian UTM Zone 4n” projection, with the EPSG identified from an online search. How does this fare?
Chi_tracts.Hawaii = st_transform(Chi_tracts, 102114)
plot(st_geometry(Chi_tracts.Hawaii), border = "gray", lwd = 2, main = "Old Hawaiian UTM Zone 4N", sub="wrong projection!")
Finally.. let’s choose a projection that is focused on Illinois, and uses distance as feet or meters, to make it a bit more accessible for our work. EPSG:3435 is a good fit:
## Coordinate Reference System:
## User input: EPSG:3435
## wkt:
## PROJCS["NAD83 / Illinois East (ftUS)",
## GEOGCS["NAD83",
## DATUM["North_American_Datum_1983",
## SPHEROID["GRS 1980",6378137,298.257222101,
## AUTHORITY["EPSG","7019"]],
## TOWGS84[0,0,0,0,0,0,0],
## AUTHORITY["EPSG","6269"]],
## PRIMEM["Greenwich",0,
## AUTHORITY["EPSG","8901"]],
## UNIT["degree",0.0174532925199433,
## AUTHORITY["EPSG","9122"]],
## AUTHORITY["EPSG","4269"]],
## PROJECTION["Transverse_Mercator"],
## PARAMETER["latitude_of_origin",36.66666666666666],
## PARAMETER["central_meridian",-88.33333333333333],
## PARAMETER["scale_factor",0.999975],
## PARAMETER["false_easting",984250.0000000002],
## PARAMETER["false_northing",0],
## UNIT["US survey foot",0.3048006096012192,
## AUTHORITY["EPSG","9003"]],
## AXIS["X",EAST],
## AXIS["Y",NORTH],
## AUTHORITY["EPSG","3435"]]

1.5 Refine Basic Map
Now we’ll switch to a more extensive cartographic mapping package, tmap. We approach mapping with one layer at a time. Always start with the object you want to map by calling it with the tm_shape function. Then, at least one descriptive/styling function follows. There are hundreds of variations and paramater specifications, so take your time in exploring tmap and the options.
Here we style the tracts with some semi-transparent borders.

Next we fill the tracts with a light gray, and adjust the color and transparency of borders. We also add a scale bar, positioning it to the left and having a thickness of 0.8 units, and turn off the frame.
tm_shape(Chi_tracts) + tm_fill(col = "gray90") + tm_borders(alpha=0.2, col = "gray10") +
tm_scale_bar(position = ("left"), lwd = 0.8) +
tm_layout(frame = F)
Check out https://rdrr.io/cran/tmap/man/tm_polygons.html for more ideas!
1.6 Arrange multiple maps
Sometimes we want to look at multiple maps at once. Write your mapping function to a new variable, and then call that variable in order of desire using the tmap_arrange function. Hint: this is just one of many! ways to map multiples using tmap… see if you can uncover more in the documentation.
tracts.4326 <- tm_shape(Chi_tracts) + tm_fill(col = "gray90") +
tm_layout(frame = F, title = "EPSG 4326")
tracts.54019 <- tm_shape(Chi_tracts.54019) + tm_fill(col = "gray90") + tm_layout(frame = F, title = "EPSG 54019")
tmap_arrange(tracts.4326, tracts.54019)
1.7 Interactive Mode
So far, we’ve been plotting static maps. We can also switch to an interactive map that uses a Leaflet widget by switching the tmap_mode() parameter specification from “plot” to “view.” It’s on “plot” as default.
## tmap mode set to interactive viewing
Map the same map as before, and check out the interaction!
tm_shape(Chi_tracts) + tm_fill(col = "gray90") + tm_borders(alpha=0.2, col = "gray10") +
tm_scale_bar(position = ("left"), lwd = 0.8) +
tm_layout(frame = F)The tracts are not transparent enough, so we update that here. You can also click the box on the left side to try out other basemaps. See if you can find out how to add a basemap to a static/plotted map, using tmap documentation…
We revert back to plot mode for now.
## tmap mode set to plotting
1.8 Overlay Zip Code Boundaries
How do census tract areas correspond to zip codes? While tracts better represent neighborhoods, often times we are stuck with zip code level scale in healh research. Here we’ll make a reference map to highlight tract distribution across each zip code.
First, we read in zip code boundaries. This data was downloaded directly from the City of Chicago Data Portal as a shapefile.
## Reading layer `geo_export_54bc15d8-5ef5-40e4-8f72-bb0c6dbac9a5' from data source
## `/Users/maryniakolak/code/Intro2RSpatialMed/data/geo_export_54bc15d8-5ef5-40e4-8f72-bb0c6dbac9a5.shp'
## using driver `ESRI Shapefile'
## Simple feature collection with 61 features and 4 fields
## Geometry type: MULTIPOLYGON
## Dimension: XY
## Bounding box: xmin: -87.94011 ymin: 41.64454 xmax: -87.52414 ymax: 42.02304
## CRS: 4326
Next, we layer the new shape in – on top of the tracts. We use a thicker border, and try out a new color. Experiment!
## FIRST LAYER: CENSUS TRACT BOUNADRIES
tm_shape(Chi_tracts.3435) + tm_fill(col = "gray90") +
tm_borders(alpha=0.2, col = "gray10") +
## SECOND LAYER: ZIP CODE BOUNDARIES WITH LABEL
tm_shape(Chi_Zips) + tm_borders(lwd = 2, col = "#0099CC") +
tm_text("zip", size = 0.7) +
## MORE CARTOGRAPHIC STYLE
tm_scale_bar(position = ("left"), lwd = 0.8) +
tm_layout(frame = F)
More Resources
On spatial data basics & sf:
On projections:
On tmap:
2 Map Neighborhoods
When considering the health of persons, we have to also consider the neighborhood environment. Sometimes this is looking at neighborhood level health outcomes, like premature mortality at the census tract scale, or cumulative COVID rates by zip code. Sometimes we’re interested in neighborhood factors like poverty, access to affordable housing, or distance to nearest health provider, or pollution-emitting facility. These measurements of the “social determinants of health” at the neighborhood scale are increasingly urgent in modern public health thinking, and are thought to drive and/or reinforce racial, social, and spatial inequity
In this module, we’ll learn about the basics of thematic mapping – known as choropleth mapping – to visualize neighborhood level health phenomena. This will allow you to begin the process of exploratory spatial data analysis and hypothesis generation & refinement.
2.1 Clean Attribute Data
Let’s consider COVID-19 cases by zip code in Chicago. We’ll upload and inspect a summary of cases from the Chicago Data Portal first:
## ZIP.Code Week.Number Week.Start Week.End Cases...Weekly
## 1 60603 39 09/20/2020 09/26/2020 0
## 2 60604 39 09/20/2020 09/26/2020 0
## 3 60611 16 04/12/2020 04/18/2020 8
## 4 60611 15 04/05/2020 04/11/2020 7
## 5 60615 11 03/08/2020 03/14/2020 NA
## 6 60603 10 03/01/2020 03/07/2020 NA
## Cases...Cumulative Case.Rate...Weekly Case.Rate...Cumulative
## 1 13 0 1107.3
## 2 31 0 3964.2
## 3 72 25 222.0
## 4 64 22 197.4
## 5 NA NA NA
## 6 NA NA NA
## Tests...Weekly Tests...Cumulative Test.Rate...Weekly
## 1 25 327 2130
## 2 12 339 1534
## 3 101 450 312
## 4 59 349 182
## 5 6 9 14
## 6 0 0 0
## Test.Rate...Cumulative Percent.Tested.Positive...Weekly
## 1 27853.5 0.0
## 2 43350.4 0.0
## 3 1387.8 0.1
## 4 1076.3 0.1
## 5 21.7 NA
## 6 0.0 NA
## Percent.Tested.Positive...Cumulative Deaths...Weekly
## 1 0.0 0
## 2 0.1 0
## 3 0.2 0
## 4 0.2 0
## 5 NA 0
## 6 NA 0
## Deaths...Cumulative Death.Rate...Weekly Death.Rate...Cumulative
## 1 0 0 0
## 2 0 0 0
## 3 0 0 0
## 4 0 0 0
## 5 0 0 0
## 6 0 0 0
## Population Row.ID ZIP.Code.Location
## 1 1174 60603-39 POINT (-87.625473 41.880112)
## 2 782 60604-39 POINT (-87.629029 41.878153)
## 3 32426 60611-16 POINT (-87.620291 41.894734)
## 4 32426 60611-15 POINT (-87.620291 41.894734)
## 5 41563 60615-11 POINT (-87.602725 41.801993)
## 6 1174 60603-10 POINT (-87.625473 41.880112)
Each row corresponds to a zip code at a different week. This data thus exists as a “long” format, which doesn’t work for spatial analysis. We need to convert to “wide” format, or at the very least, ensure that each zip code corresponds to one row.
To simplify, let’s identify the last week of the dataset, and then subset the data frame to only show that week. We will be interested in the cumulative case rate. Following is one way of doing this – can you think of another way? Try out different approaches of reshaping data to test your R and “tidy” skills.
## [1] 1 31
## ZIP.Code Week.Number Week.Start Week.End Cases...Weekly
## 1 60603 39 09/20/2020 09/26/2020 0
## 2 60604 39 09/20/2020 09/26/2020 0
## 36 60601 39 09/20/2020 09/26/2020 8
## 37 60602 39 09/20/2020 09/26/2020 0
## 41 60605 39 09/20/2020 09/26/2020 12
## 66 60610 39 09/20/2020 09/26/2020 35
## Cases...Cumulative Case.Rate...Weekly Case.Rate...Cumulative
## 1 13 0 1107.3
## 2 31 0 3964.2
## 36 213 54 1451.4
## 37 21 0 1688.1
## 41 391 44 1420.8
## 66 666 90 1706.9
## Tests...Weekly Tests...Cumulative Test.Rate...Weekly
## 1 25 327 2130
## 2 12 339 1534
## 36 202 4304 1376
## 37 27 460 2170
## 41 291 7160 1058
## 66 500 10680 1281
## Test.Rate...Cumulative Percent.Tested.Positive...Weekly
## 1 27853.5 0.0
## 2 43350.4 0.0
## 36 29328.8 0.0
## 37 36977.5 0.0
## 41 26018.4 0.0
## 66 27371.3 0.1
## Percent.Tested.Positive...Cumulative Deaths...Weekly
## 1 0.0 0
## 2 0.1 0
## 36 0.0 1
## 37 0.0 0
## 41 0.1 1
## 66 0.1 0
## Deaths...Cumulative Death.Rate...Weekly Death.Rate...Cumulative
## 1 0 0.0 0.0
## 2 0 0.0 0.0
## 36 6 6.8 40.9
## 37 0 0.0 0.0
## 41 3 3.6 10.9
## 66 10 0.0 25.6
## Population Row.ID ZIP.Code.Location
## 1 1174 60603-39 POINT (-87.625473 41.880112)
## 2 782 60604-39 POINT (-87.629029 41.878153)
## 36 14675 60601-39 POINT (-87.622844 41.886262)
## 37 1244 60602-39 POINT (-87.628309 41.883136)
## 41 27519 60605-39 POINT (-87.623449 41.867824)
## 66 39019 60610-39 POINT (-87.63581 41.90455)
To clean our data a bit, we’ll just keep the zip code name, and cumulative case rate for the week of September 20th, 2020.
## ZIP.Code Case.Rate...Cumulative
## 1 60603 1107.3
## 2 60604 3964.2
## 36 60601 1451.4
## 37 60602 1688.1
## 41 60605 1420.8
## 66 60610 1706.9
2.2 Merge Spatial Data
Next, let’s merge this data to our zip code master spatial file. Reload if necessary:
## Reading layer `geo_export_54bc15d8-5ef5-40e4-8f72-bb0c6dbac9a5' from data source
## `/Users/maryniakolak/code/Intro2RSpatialMed/data/geo_export_54bc15d8-5ef5-40e4-8f72-bb0c6dbac9a5.shp'
## using driver `ESRI Shapefile'
## Simple feature collection with 61 features and 4 fields
## Geometry type: MULTIPOLYGON
## Dimension: XY
## Bounding box: xmin: -87.94011 ymin: 41.64454 xmax: -87.52414 ymax: 42.02304
## CRS: 4326
## Simple feature collection with 6 features and 4 fields
## Geometry type: MULTIPOLYGON
## Dimension: XY
## Bounding box: xmin: -87.80649 ymin: 41.88747 xmax: -87.59852 ymax: 41.93228
## CRS: 4326
## objectid shape_area shape_len zip geometry
## 1 33 106052287 42720.04 60647 MULTIPOLYGON (((-87.67762 4...
## 2 34 127476051 48103.78 60639 MULTIPOLYGON (((-87.72683 4...
## 3 35 45069038 27288.61 60707 MULTIPOLYGON (((-87.785 41....
## 4 36 70853834 42527.99 60622 MULTIPOLYGON (((-87.66707 4...
## 5 37 99039621 47970.14 60651 MULTIPOLYGON (((-87.70656 4...
## 6 38 23506056 34689.35 60611 MULTIPOLYGON (((-87.61401 4...
Next, merge on zip code ID. The key in the Chi_Zips object is zip, whereas the key for the COVID data is ZIP.code. Always merge non-spatial to spatial data, not the other way around. Think of the spatial file as your master file that you will continue to add on to…
## Simple feature collection with 6 features and 5 fields
## Geometry type: MULTIPOLYGON
## Dimension: XY
## Bounding box: xmin: -87.63999 ymin: 41.85317 xmax: -87.60246 ymax: 41.88913
## CRS: 4326
## zip objectid shape_area shape_len Case.Rate...Cumulative
## 1 60601 27 9166246 19804.58 1451.4
## 2 60602 26 4847125 14448.17 1688.1
## 3 60603 19 4560229 13672.68 1107.3
## 4 60604 48 4294902 12245.81 3964.2
## 5 60605 20 36301276 37973.35 1420.8
## 6 60606 31 6766411 12040.44 2289.6
## geometry
## 1 MULTIPOLYGON (((-87.62271 4...
## 2 MULTIPOLYGON (((-87.60997 4...
## 3 MULTIPOLYGON (((-87.61633 4...
## 4 MULTIPOLYGON (((-87.63376 4...
## 5 MULTIPOLYGON (((-87.62064 4...
## 6 MULTIPOLYGON (((-87.63397 4...
2.3 Quantile Maps
Starting with a “classic epi” approach, let’s look at case rates as quantiles. We use the tmap library, and update the choropleth data classification using the style parameter. We use the Blue-Purple palette, or BuPu, from Colorbrewer.
Colorbrewer Tip: To display all Colorbrewer palette options, load the RColorBrewer library and run display.brewer.all() – or just Google “R Colorbrewer palettes.”
library(tmap)
tm_shape(Chi_Zipsf ) +
tm_polygons("Case.Rate...Cumulative",
style="quantile", pal="BuPu",
title = "COVID Case Rate") 
Let’s try tertiles:
tm_shape(Chi_Zipsf ) +
tm_polygons("Case.Rate...Cumulative",
style="quantile", n=3, pal="BuPu",
title = "COVID Case Rate") 
2.4 Standard Deviation Maps
While quantiles are a nice start, let’s classify using a standard deviation map. Standard deviation is a statistical technique type of map based on how much the data differs from the mean.
tm_shape(Chi_Zipsf ) +
tm_polygons("Case.Rate...Cumulative",
style="sd", pal="BuPu",
title = "COVID Case Rate") 
2.5 Jenks Maps
Another approach of data classification is natural breaks, or jenks. This approach looks for “natural breaks” in the data using a univariate clustering algorithm.
tm_shape(Chi_Zipsf ) +
tm_polygons("Case.Rate...Cumulative",
style="jenks", pal="BuPu",
title = "COVID Case Rate") 
The first bin doesn’t seem very intuitive. Let’s try 4 bins instead of 5 by changing the n parameter. In this version, we’ll also had a histogram and scale bar, and move the legend outside the frame to make it easier to view.
tm_shape(Chi_Zipsf ) +
tm_polygons("Case.Rate...Cumulative",
style="jenks", pal="BuPu",
legend.hist=T, n=4,
title = "COVID Case Rate", ) +
tm_scale_bar(position = "left") +
tm_layout(legend.outside = TRUE, legend.outside.position = "right")
2.6 Integrate More Data
To explore potential disparities in COVID health outcomes, let’s bring in pre-cleaned demographic, racial, and ethnic data from the Opioid Environment Policy Scan database. This data is orginally sourced from the American Community Survey 2018 5-year estimate, which you could also pull using the tidycensus.
## ZCTA year totPopE whiteP blackP amIndP asianP pacIsP otherP hispP
## 1 35004 2018 11762 84.39 13.09 0.00 0.94 0.00 1.57 0.94
## 2 35005 2018 7528 55.22 42.44 0.64 0.00 0.15 1.55 1.37
## 3 35006 2018 2927 96.04 3.21 0.27 0.00 0.00 0.48 0.00
## 4 35007 2018 26328 73.83 13.75 0.04 1.33 0.02 11.01 11.11
## 5 35010 2018 20625 63.07 32.43 0.39 0.65 0.00 3.45 4.10
## 6 35013 2018 40 100.00 0.00 0.00 0.00 0.00 0.00 100.00
## noHSP age0_4 age5_14 age15_19 age20_24 age15_44 age45_49 age50_54
## 1 5.52 787 1950 457 746 4552 662 541
## 2 17.48 511 1055 455 277 2429 580 469
## 3 14.44 161 413 141 203 878 129 193
## 4 12.41 1891 4161 1619 1400 9947 1993 2067
## 5 22.00 1013 2647 1383 1087 7036 1418 1545
## 6 100.00 0 0 0 0 13 8 19
## age55_59 age60_64 ageOv65 ageOv18 age18_64 a15_24P und45P ovr65P
## 1 776 832 1662 8820 7158 10.23 61.97 14.13
## 2 560 552 1372 5691 4319 9.72 53.07 18.23
## 3 316 278 559 2308 1749 11.75 49.61 19.10
## 4 1713 1315 3241 19178 15937 11.47 60.77 12.31
## 5 1510 1341 4115 16142 12027 11.98 51.86 19.95
## 6 0 0 0 40 40 0.00 32.50 0.00
## disbP
## 1 12.7
## 2 23.2
## 3 20.9
## 4 13.5
## 5 19.6
## 6 0.0
Merge to our master Zip Code dataset.
## Simple feature collection with 6 features and 31 fields
## Geometry type: MULTIPOLYGON
## Dimension: XY
## Bounding box: xmin: -87.63999 ymin: 41.85317 xmax: -87.60246 ymax: 41.88913
## CRS: 4326
## zip objectid shape_area shape_len Case.Rate...Cumulative year
## 1 60601 27 9166246 19804.58 1451.4 2018
## 2 60602 26 4847125 14448.17 1688.1 2018
## 3 60603 19 4560229 13672.68 1107.3 2018
## 4 60604 48 4294902 12245.81 3964.2 2018
## 5 60605 20 36301276 37973.35 1420.8 2018
## 6 60606 31 6766411 12040.44 2289.6 2018
## totPopE whiteP blackP amIndP asianP pacIsP otherP hispP noHSP age0_4
## 1 14675 74.17 5.57 0.45 18.00 0.00 1.81 8.68 0.00 550
## 2 1244 68.17 3.78 5.31 19.45 0.00 3.30 6.51 0.00 61
## 3 1174 63.46 3.24 0.00 27.60 0.00 5.71 9.80 0.00 13
## 4 782 63.43 5.63 0.00 29.67 0.00 1.28 4.35 0.00 12
## 5 27519 61.20 17.18 0.18 16.10 0.03 5.31 5.84 2.39 837
## 6 3101 72.75 2.35 0.00 18.09 0.00 6.80 6.29 0.73 57
## age5_14 age15_19 age20_24 age15_44 age45_49 age50_54 age55_59
## 1 156 907 909 8726 976 1009 324
## 2 87 18 91 987 46 53 0
## 3 43 179 172 684 75 47 150
## 4 7 52 168 450 27 47 54
## 5 1279 2172 2282 16364 1766 1520 1824
## 6 44 0 139 1863 213 153 168
## age60_64 ageOv65 ageOv18 age18_64 a15_24P und45P ovr65P disbP
## 1 859 2075 13855 11780 12.37 64.27 14.14 6.4
## 2 5 5 1095 1090 8.76 91.24 0.40 0.2
## 3 50 112 1118 1006 29.90 63.03 9.54 7.3
## 4 92 93 744 651 28.13 59.97 11.89 4.1
## 5 1360 2569 25259 22690 16.19 67.15 9.34 5.3
## 6 172 431 3000 2569 4.48 63.33 13.90 1.9
## geometry
## 1 MULTIPOLYGON (((-87.62271 4...
## 2 MULTIPOLYGON (((-87.60997 4...
## 3 MULTIPOLYGON (((-87.61633 4...
## 4 MULTIPOLYGON (((-87.63376 4...
## 5 MULTIPOLYGON (((-87.62064 4...
## 6 MULTIPOLYGON (((-87.63397 4...
2.7 Thematic Map Panel
To facilitate data discovery, we likely want to explore multiple maps at once. Here we’ll generate maps for multiple variables, and plot them as a map panel.
Can you think of more efficient ways to run this code? There are also other tmap tricks to optimize this further, so enjoy your journey!
COVID <- tm_shape(Chi_Zipsf) + tm_fill("Case.Rate...Cumulative",
style="jenks", pal="BuPu", n=4, title = "COVID Rt") +
tm_layout(frame = F)
Senior <- tm_shape(Chi_Zipsf) + tm_fill("ovr65P",
style="jenks", pal="BuPu", n=4) +
tm_layout(frame = F)
NoHS <- tm_shape(Chi_Zipsf) + tm_fill("noHSP",
style="jenks", pal="BuPu", n=4) +
tm_layout(frame = F)
BlkP <- tm_shape(Chi_Zipsf) + tm_fill("blackP",
style="jenks", pal="BuPu", n=4) +
tm_layout(frame = F)
Latnx <- tm_shape(Chi_Zipsf) + tm_fill("hispP",
style="jenks", pal="BuPu", n=4) +
tm_layout(frame = F)
WhiP <- tm_shape(Chi_Zipsf) + tm_fill("whiteP",
style="jenks", pal="BuPu", n=4) +
tm_layout(frame = F)
tmap_arrange(COVID, Senior, NoHS, BlkP, Latnx, WhiP)
From the results, we see that cumulative COVID outcomes for one week in September 2020 seemed to have some geographic correlation with the Latinx/Hispanic community in Chicago. At the same time, low high school diploma rates are also concentrated in these areas, and there is some intersection with other variables considered. What are additional variables you could bring in to refine your approach? Perhaps percentage of essential workers; a different age group; internet access? What about linking in health outcomes like Asthma, Hypertension, and more at a similar scale?
In modern spatial epidemiology, associations must never be taken at face value. For example, we know that it is not “race” but “racism” that drives multiple health disparities – simply looking at a specific racial/ethnic group is not enough. Thus exploring multiple variables and nurturing a curiosity to understand these complex intersections will support knowledge discovery.
2.8 Write Data
We’re done! Well… not so fast. Let’s save the data so we don’t have to run the codebook again to access the data. Here, we’ll save as a geojson file. This spatial format is more forgiving with long column names, which is a long-standing challenge with shapefiles.
More Resources
For choropleth mapping in R:
3 Adding Resources
In addition to areal data, we can also extract information from individual locations. Locations, when measured as points, can include things like:
- Health providers: Hospitals, Clinics, Pharmacies, Mental health providers, Medication for opioid use disorder providers
- Area resources: Grocery stores & Supermarkets, Playgrounds, Daycare centers, Schools, Community centers
- Area challenges: Crime, Superfund sites, Pollution-emitting facilities
Points can also represent people, like individual patients residing in an area. Because individual locations for persons is protected health information, we’ll focus on point data as resources in the chapter. However, you can reuse the code snippets in this workshop to wrangle patient-level data the same way in a secure environment, under the guidance of your friendly IRB ethics board.
In this example, we’ll extend our Chicago example. We’ll identify areas with high COVID rates, low geographic access to methadone maintenance therapy, and less access to affordable rental housing units managed by the city. We are interested in locating zip codes that may be especially vulnerable to persons with opioid use disorder who use MOUDs. (This is oversimplified, but our example to work with.)
3.1 Geocode
If you start with only addresses, you’ll need to geocode. Our methadone maintenance provider dataset is only available as such. Addresses are comprised of characeters that reference a specific place. We will use the network topology service of a Geocoder to translate that address to a coordinate in some CRS.
First we load the tidygeocoder to get our geocoding done. Note, this uses the interent to process, so is not suitable for HIPPA protected data like individual, living person addresses. For offline geocoders, check out Pelias or ESRI.
Let’s read in and inspect data for methadone maintenance providers. Note, these addresses were made available by SAMSHA, and are known as publicly available information. An additional analysis could call each service to check on access to medication during COVID in Septmber 2020, and the list would be updated further.
## X Name
## 1 1 Chicago Treatment and Counseling Center, Inc.
## 2 2 Sundace Methadone Treatment Center, LLC
## 3 3 Soft Landing Interventions/DBA Symetria Recovery of Lakeview
## 4 4 PDSSC - Chicago, Inc.
## 5 5 Center for Addictive Problems, Inc.
## 6 6 Family Guidance Centers, Inc.
## Address City State Zip
## 1 4453 North Broadway st. Chicago IL 60640
## 2 4545 North Broadway St. Chicago IL 60640
## 3 3934 N. Lincoln Ave. Chicago IL 60613
## 4 2260 N. Elston Ave. Chicago IL 60614
## 5 609 N. Wells St. Chicago IL 60654
## 6 310 W. Chicago Ave. Chicago IL 60654
Let’s geocode one address first, just to make sure our system is working. We’ll use the “cascade” method which use the US Census and OpenStreetMap geocoders. These two services are the main options with tidygeocoder.
sample <- geo("2260 N. Elston Ave. Chicago, IL", lat = latitude, long = longitude, method = 'cascade')
head(sample)## # A tibble: 1 x 4
## address latitude longitude geo_method
## <chr> <dbl> <dbl> <chr>
## 1 2260 N. Elston Ave. Chicago, IL 41.9 -87.7 census
As we prepare for geocoding, check out the structure of the dataset. Do we need to change anything? The data should be a character to be read properly.
## 'data.frame': 27 obs. of 6 variables:
## $ X : int 1 2 3 4 5 6 7 8 9 10 ...
## $ Name : Factor w/ 25 levels "*","A Rincon Family Services",..: 5 25 23 21 3 8 2 1 14 24 ...
## $ Address: Factor w/ 27 levels "110 E. 79th St.",..: 20 21 17 6 23 10 16 3 5 8 ...
## $ City : Factor w/ 1 level "Chicago": 1 1 1 1 1 1 1 1 1 1 ...
## $ State : Factor w/ 1 level "IL": 1 1 1 1 1 1 1 1 1 1 ...
## $ Zip : int 60640 60640 60613 60614 60654 60654 60651 60607 60607 60616 ...
We need to clean the data a bit. We’ll add a new column for a full address, as required by the geocoding service. When you use a geocoding service, be sure to read the documentation and understand how the data needs to be formatted for input.
methadoneClinics$fullAdd <- paste(as.character(methadoneClinics$Address),
as.character(methadoneClinics$City),
as.character(methadoneClinics$State),
as.character(methadoneClinics$Zip))We’re ready to go! Batch geocode with one function, and inspect:
geoCodedClinics <- geocode(methadoneClinics,
address = 'fullAdd', lat = latitude, long = longitude, method = 'cascade')
head(geoCodedClinics)## # A tibble: 6 x 10
## X Name Address City State Zip fullAdd latitude longitude
## <int> <fct> <fct> <fct> <fct> <int> <chr> <dbl> <dbl>
## 1 1 Chic… 4453 N… Chic… IL 60640 4453 N… NA NA
## 2 2 Sund… 4545 N… Chic… IL 60640 4545 N… NA NA
## 3 3 Soft… 3934 N… Chic… IL 60613 3934 N… 42.0 -87.7
## 4 4 PDSS… 2260 N… Chic… IL 60614 2260 N… 41.9 -87.7
## 5 5 Cent… 609 N.… Chic… IL 60654 609 N.… 41.9 -87.6
## 6 6 Fami… 310 W.… Chic… IL 60654 310 W.… 41.9 -87.6
## # … with 1 more variable: geo_method <chr>
There were two that didn’t geocode correctly. You can inspect further. This could involve a quick check for spelling issues; or, searching the address and pulling the lat/long using Google Maps and inputting manually. Or, if we are concerned it’s a human or unknown error, we could omit. For this exercise we’ll just omit the two clinics that didn’t geocode correctly.
3.2 Convert to Spatial Data
This is not spatial data yet! To convert a static file to spatial data, we use the powerful st_as_sf function from sf. Indicate the x,y parameters (=longitude, latitude) and the coordinate reference system used. Our geocoding service used the standard EPSG:4326, so we input that here.
3.3 Basic Map of Points
For a really simple map of points – to ensure they were geocoded and converted to spatial data correctly, we use tmap. We’ll use the interactive version to view.
If your points didn’t plot correctly:
- Did you flip the longitude/latitude values?
- Did you input the correct CRS?
Those two issues are the most common errors.
3.4 Overlay Points & Style
Let’s add our zip code map from the previous module. First load the data, then overlay.
## Reading layer `ChiZipMaster1' from data source
## `/Users/maryniakolak/code/Intro2RSpatialMed/data/ChiZipMaster1.geojson'
## using driver `GeoJSON'
## Simple feature collection with 540 features and 31 fields
## Geometry type: MULTIPOLYGON
## Dimension: XY
## Bounding box: xmin: -87.87596 ymin: 41.64454 xmax: -87.52414 ymax: 42.02304
## CRS: 4326
With this overlay, we’ll add a “hack” to include the methadone clinic points in a legend.
## tmap mode set to plotting
## 1st layer (gets plotted first)
tm_shape(Chi_Zipsf) + tm_fill("Case.Rate...Cumulative",
style="jenks", pal="BuPu", n=4, title = "COVID Rt") +
## 2nd layer (overlay)
tm_shape(methadoneSf) + tm_dots(size = 0.2, col = "gray20") +
## "Hack" a manual symbology for dots in the legend
tm_add_legend("symbol", col = "gray20", size = .2, labels = "Methadone MOUD") +
## Cartographic Styling
tm_layout(legend.outside = TRUE, legend.outside.position = "right")
3.5 Integrate More Data
From here, we can integrate more data. Let’s try a different point dataset – Affordable Rental Housing Developments, as made available by the City of Chicago Data Portal. This could be interesting for a number of different reasons – maybe we hypothesize better outcomes are associated with better access to affordable housing options? Or, we hypothesize the opposite, that mean distance to more population dense housing locations is vulnerable to airborne disease?
For this example, we’ll think about this dataset as access to secure and affordable housing. Persons with lower incomes residing in places with fewer developments may be more vulnerable to housing insecurity -> impacts health.
## Community.Area.Name Community.Area.Number Property.Type
## 1 Englewood 68 Veterans
## 2 Rogers Park 1 Senior
## 3 Uptown 3 ARO
## 4 Edgewater 77 Senior
## 5 Roseland 49 Supportive Housing
## 6 Humboldt Park 23 Multifamily
## Property.Name Address Zip.Code
## 1 Hope Manor Village 5900-6100 S. Green/Peoria/Sangamon 60621
## 2 Morse Senior Apts. 6928 N. Wayne Ave. 60626
## 3 The Draper 5050 N. Broadway 60640
## 4 Pomeroy Apts. 5650 N. Kenmore Ave. 60660
## 5 Wentworth Commons 11045 S. Wentworth Ave. 60628
## 6 Nelson Mandela Apts. 607 N. Sawyer Ave. 60624
## Phone.Number Management.Company Units X.Coordinate
## 1 312-564-2393 Volunteers of America Illinois 36 NA
## 2 312-602-6207 Morse Urban Dev. 44 1165844
## 3 312-818-1722 Flats LLC 35 1167357
## 4 773-275-7820 Habitat Company 198 1168181
## 5 773-568-7804 Mercy Housing Lakefront 50 1176951
## 6 773-227-6332 Bickerdike Apts. 6 1154640
## Y.Coordinate Latitude Longitude
## 1 NA NA NA
## 2 1946059 42.00757 -87.66517
## 3 1933882 41.97413 -87.65996
## 4 1937918 41.98519 -87.65681
## 5 1831516 41.69302 -87.62777
## 6 1903912 41.89215 -87.70753
## Location
## 1
## 2 (42.0075737709331, -87.6651711448293)
## 3 (41.9741295261027, -87.6599553011627)
## 4 (41.9851867755403, -87.656808676983)
## 5 (41.6930159120977, -87.6277673462214)
## 6 (41.8921534052465, -87.7075265659001)
There were a few data points with odd inputs and null values. Remember, we can’t convert any null values to spatial coordinates. Again, in an ideal context, you would explore and understand what is happening, systematically. In our experiment, we’ll omit nulls.
Look at the structure of the object.
## 'data.frame': 487 obs. of 14 variables:
## $ Community.Area.Name : Factor w/ 65 levels "Albany Park",..: 49 55 17 50 25 22 65 45 45 37 ...
## $ Community.Area.Number: int 1 3 77 49 23 38 42 36 36 8 ...
## $ Property.Type : Factor w/ 26 levels "65+/Supportive",..: 13 2 13 19 8 8 8 8 13 19 ...
## $ Property.Name : Factor w/ 394 levels "1038 N. Ashland",..: 214 338 257 376 218 186 109 235 234 348 ...
## $ Address : Factor w/ 478 levels "10 N. Hamlin Ave.",..: 424 348 367 17 384 298 71 279 275 64 ...
## $ Zip.Code : int 60626 60640 60660 60628 60624 60653 60637 60653 60653 60622 ...
## $ Phone.Number : Factor w/ 326 levels "217-779-5697",..: 57 78 134 221 111 232 225 164 164 32 ...
## $ Management.Company : Factor w/ 200 levels "@properties",..: 112 57 66 104 17 80 176 167 169 74 ...
## $ Units : int 44 35 198 50 6 71 67 534 148 40 ...
## $ X.Coordinate : num 1165844 1167357 1168181 1176951 1154640 ...
## $ Y.Coordinate : num 1946059 1933882 1937918 1831516 1903912 ...
## $ Latitude : num 42 42 42 41.7 41.9 ...
## $ Longitude : num -87.7 -87.7 -87.7 -87.6 -87.7 ...
## $ Location : Factor w/ 477 levels "","(41.648457411436, -87.5401231660406)",..: 475 455 462 8 289 122 67 147 151 328 ...
## - attr(*, "na.action")= 'omit' Named int 1
## ..- attr(*, "names")= chr "1"
In this dataset, we can see coordinate information is already included – twice! You’re looking at 2 different types of coordinate systems. We’ll use “Longitude” and “Latitude” to represent X,Y and an ESPG of 4326. We’re guessing, and hopeful.
We can now map the data for a quick view – does this look like Chicago, hopefully?

3.6 Graduated Symbology
Previously we mapped points as dots. We literally used the tm_dots() function to do so. Another option is changing the size of the point, according to some attribute of the data. In this dataset, we see an attribute field that gives us the total number of units per housing site. Let’s use a graduated symbology, with the tm_bubbles() function, to map these points. That way points with more units will be bigger, and not all places are weighted the same visually.
tm_shape(Chi_Zipsf) + tm_polygons(col = "gray80") +
tm_shape(AffHousingSf) + tm_bubbles("Units", col = "purple") 
3.7 Style Final Map
Let’s pull what we learned in the last tutorial, and map everything at once. Which zip codes are the most vulnerable to persons with OUD during the pandemic in September 2020, based on the information we have here?
## Zip Codes with Labels
tm_shape(Chi_Zipsf) + tm_fill("Case.Rate...Cumulative",
style="jenks", pal="BuPu", n=4, title = "COVID Rt") +
tm_text("zip", size = 0.7) +
## Affordable Housing Units
tm_shape(AffHousingSf) + tm_bubbles("Units") +
## Methadone MOUD
tm_shape(methadoneSf) + tm_dots(size = 0.2, col = "gray20") +
## Cartographic Styling
tm_add_legend("symbol", col = "gray20", size = .2, labels = "Methadone MOUD") + tm_layout(legend.outside = TRUE, legend.outside.position = "right")
In RStudio, you could zoom into the plot you created to get a better view. Save as an image, or save as a webpage!
Save any data you need from this session.
4 Calculate Spatial Metrics
While we’ve generated some nice visualizations, we need insights quantified as metrics at the neighborhood level. Don’t start this step until you have a good idea of what you need. Visualizing and exploring the data in depth is best practice.
For our purposes, we’re interested in developing spatial access metrics with a container method approach. At the end of this tutorial, we’ll generate the following new variables:
- Total number of Methadone Maintenance MOUD by zip code
- Total number of Walkble MOUD Service Areas by zip code
Plus, we will have a new spatial layer, that includes the actual service areas (ie. 1-mile buffers of MOUDs). We assume that access to MOUDs is critical and requires high regularity, and that walking is the most likely option during COVID. This guides the parameter specification of MOUD Service Areas (and is also backed up by some literature in this space, though much more is needed.)
4.1 Load Spatial Data
Let’s first reload our spatial data – this will be the MOUD points, plus the master zip code spatial file.
## Reading layer `methadoneMOUD' from data source
## `/Users/maryniakolak/code/Intro2RSpatialMed/data/methadoneMOUD.geojson'
## using driver `GeoJSON'
## Simple feature collection with 25 features and 8 fields
## Geometry type: POINT
## Dimension: XY
## Bounding box: xmin: -87.7349 ymin: 41.68698 xmax: -87.57673 ymax: 41.9533
## CRS: 4326
## Reading layer `geo_export_54bc15d8-5ef5-40e4-8f72-bb0c6dbac9a5' from data source
## `/Users/maryniakolak/code/Intro2RSpatialMed/data/geo_export_54bc15d8-5ef5-40e4-8f72-bb0c6dbac9a5.shp'
## using driver `ESRI Shapefile'
## Simple feature collection with 61 features and 4 fields
## Geometry type: MULTIPOLYGON
## Dimension: XY
## Bounding box: xmin: -87.94011 ymin: 41.64454 xmax: -87.52414 ymax: 42.02304
## CRS: 4326
## Simple feature collection with 6 features and 8 fields
## Geometry type: POINT
## Dimension: XY
## Bounding box: xmin: -87.72186 ymin: 41.88474 xmax: -87.63409 ymax: 41.9533
## CRS: 4326
## X Name
## 1 3 Soft Landing Interventions/DBA Symetria Recovery of Lakeview
## 2 4 PDSSC - Chicago, Inc.
## 3 5 Center for Addictive Problems, Inc.
## 4 6 Family Guidance Centers, Inc.
## 5 7 A Rincon Family Services
## 6 8 *
## Address City State Zip
## 1 3934 N. Lincoln Ave. Chicago IL 60613
## 2 2260 N. Elston Ave. Chicago IL 60614
## 3 609 N. Wells St. Chicago IL 60654
## 4 310 W. Chicago Ave. Chicago IL 60654
## 5 3809 W. Grand Ave. Chicago IL 60651
## 6 140 N. Ashland Ave. Chicago IL 60607
## fullAdd geo_method
## 1 3934 N. Lincoln Ave. Chicago IL 60613 census
## 2 2260 N. Elston Ave. Chicago IL 60614 census
## 3 609 N. Wells St. Chicago IL 60654 census
## 4 310 W. Chicago Ave. Chicago IL 60654 census
## 5 3809 W. Grand Ave. Chicago IL 60651 census
## 6 140 N. Ashland Ave. Chicago IL 60607 osm
## geometry
## 1 POINT (-87.67818 41.9533)
## 2 POINT (-87.67407 41.92269)
## 3 POINT (-87.63409 41.89268)
## 4 POINT (-87.63636 41.89657)
## 5 POINT (-87.72186 41.90435)
## 6 POINT (-87.66725 41.88474)
## Simple feature collection with 6 features and 4 fields
## Geometry type: MULTIPOLYGON
## Dimension: XY
## Bounding box: xmin: -87.80649 ymin: 41.88747 xmax: -87.59852 ymax: 41.93228
## CRS: 4326
## objectid shape_area shape_len zip geometry
## 1 33 106052287 42720.04 60647 MULTIPOLYGON (((-87.67762 4...
## 2 34 127476051 48103.78 60639 MULTIPOLYGON (((-87.72683 4...
## 3 35 45069038 27288.61 60707 MULTIPOLYGON (((-87.785 41....
## 4 36 70853834 42527.99 60622 MULTIPOLYGON (((-87.66707 4...
## 5 37 99039621 47970.14 60651 MULTIPOLYGON (((-87.70656 4...
## 6 38 23506056 34689.35 60611 MULTIPOLYGON (((-87.61401 4...
4.2 Transform Projections
First we need to switch to a projection that uses distance in feet or meters as a metric. We’ll use EPSG:3435 from the first tutorial. To find which EPSG was recommended, I searched “EPSG Illinois feet” and EPSG:3435 came up as a viable candidate. So, we use that for our new, projected CRS.
We may want to once again confirm they are plotting correctly:

4.3 Count resources by area
One way of understanding resource inequity is by thinking about how many resources exist in a neighborhood.
First, give the points the attributes of the polygons they are in. Inspect.
## Simple feature collection with 6 features and 12 fields
## Geometry type: POINT
## Dimension: XY
## Bounding box: xmin: 1150707 ymin: 1901294 xmax: 1174632 ymax: 1926255
## CRS: EPSG:3435
## X Name
## 1 3 Soft Landing Interventions/DBA Symetria Recovery of Lakeview
## 2 4 PDSSC - Chicago, Inc.
## 3 5 Center for Addictive Problems, Inc.
## 4 6 Family Guidance Centers, Inc.
## 5 7 A Rincon Family Services
## 6 8 *
## Address City State Zip
## 1 3934 N. Lincoln Ave. Chicago IL 60613
## 2 2260 N. Elston Ave. Chicago IL 60614
## 3 609 N. Wells St. Chicago IL 60654
## 4 310 W. Chicago Ave. Chicago IL 60654
## 5 3809 W. Grand Ave. Chicago IL 60651
## 6 140 N. Ashland Ave. Chicago IL 60607
## fullAdd geo_method objectid shape_area
## 1 3934 N. Lincoln Ave. Chicago IL 60613 census 53 53990895
## 2 2260 N. Elston Ave. Chicago IL 60614 census 32 94460632
## 3 609 N. Wells St. Chicago IL 60654 census 55 15869961
## 4 310 W. Chicago Ave. Chicago IL 60654 census 54 31598157
## 5 3809 W. Grand Ave. Chicago IL 60651 census 37 99039621
## 6 140 N. Ashland Ave. Chicago IL 60607 osm 16 106718949
## shape_len zip geometry
## 1 31196.32 60613 POINT (1162460 1926255)
## 2 50587.35 60614 POINT (1163663 1915110)
## 3 17119.70 60654 POINT (1174632 1904257)
## 4 24208.70 60610 POINT (1174003 1905671)
## 5 47970.14 60651 POINT (1150707 1908328)
## 6 42663.20 60612 POINT (1165627 1901294)
You should have the same number of rows in pipr as you do in points. If not, there is something off. You may need to go back to troubleshoot. In an earlier version of this lab, I used a saved, written geojson file of the zip codes, and it would not render properly. Here, we load in the original shapefile at the beginning of the tutorial to avoid that error.
## [1] 25 13
## [1] 25 9
## [1] 61 5
4.3.1 Count # per Area
Next, count the number per area. The frequency should be logical according to the map you made earlier. Sometimes, I’ve found bugs where the numbers are multipled by some factor; this can be checked by looking at dimension disparities, as noted above.
## Var1 Freq
## 1 60607 2
## 2 60608 1
## 3 60609 1
## 4 60613 1
## 5 60614 1
## 6 60615 1
How could improve on this step if you used dplyr?
Aggregation Tip: What if you have an attribute value you’d like to aggregate? For example, average units of affordable housing facility by zip?
Try aggregate(pip$attribute, by = list(pip$geoid), mean) but build on with a tidy sensibility…
Now we can rename our attributes:
## zip MetClnc
## 1 60607 2
## 2 60608 1
## 3 60609 1
## 4 60613 1
## 5 60614 1
## 6 60615 1
And finally, merge back to your master zip file:
## Simple feature collection with 6 features and 4 fields
## Geometry type: MULTIPOLYGON
## Dimension: XY
## Bounding box: xmin: 1127607 ymin: 1902374 xmax: 1184320 ymax: 1918596
## CRS: EPSG:3435
## objectid shape_area shape_len zip geometry
## 1 33 106052287 42720.04 60647 MULTIPOLYGON (((1162711 191...
## 2 34 127476051 48103.78 60639 MULTIPOLYGON (((1149304 191...
## 3 35 45069038 27288.61 60707 MULTIPOLYGON (((1133505 190...
## 4 36 70853834 42527.99 60622 MULTIPOLYGON (((1165664 190...
## 5 37 99039621 47970.14 60651 MULTIPOLYGON (((1154895 190...
## 6 38 23506056 34689.35 60611 MULTIPOLYGON (((1180097 190...
## Simple feature collection with 6 features and 5 fields
## Geometry type: MULTIPOLYGON
## Dimension: XY
## Bounding box: xmin: 1173038 ymin: 1889918 xmax: 1183259 ymax: 1902959
## CRS: EPSG:3435
## zip objectid shape_area shape_len MetClnc
## 1 60601 27 9166246 19804.58 NA
## 2 60602 26 4847125 14448.17 NA
## 3 60603 19 4560229 13672.68 NA
## 4 60604 48 4294902 12245.81 NA
## 5 60605 20 36301276 37973.35 NA
## 6 60606 31 6766411 12040.44 NA
## geometry
## 1 MULTIPOLYGON (((1177742 190...
## 2 MULTIPOLYGON (((1181226 190...
## 3 MULTIPOLYGON (((1179499 190...
## 4 MULTIPOLYGON (((1174763 189...
## 5 MULTIPOLYGON (((1178341 189...
## 6 MULTIPOLYGON (((1174681 190...
Quickly map to inspect:
tm_shape(areas) + tm_polygons(col = "gray80") +
tm_shape(areas) + tm_polygons(col = "MetClnc", style = "pretty", alpha = 0.8) +
tm_shape(points) + tm_dots(size = 0.5) 
4.4 Buffer Data
Next, lets create a walkable buffer of one mile, or 5280 feet, for our MOUD provider locations. Individuals residing in places outside of that walkabile area may have difficulty accessing this medication during crises, like a pandemic.
Inspect immediately:
tm_shape(areas) + tm_borders() +
tm_shape(ptbuffers) + tm_borders(col = "blue") +
tm_shape(points) + tm_dots(col = "red") 
4.5 Count buffers by area
We know that MOUD locations are accessible up to one mile away. So, a total count of resources by area may be too restrictive. Let’s calculate how many walkable MOUD clinics are in each zip code. Or, how many buffers are in each area…
## [1] 2 2 1 1 1 2
Stick buffer totals back to the zip master file:
## Simple feature collection with 6 features and 6 fields
## Geometry type: MULTIPOLYGON
## Dimension: XY
## Bounding box: xmin: 1173038 ymin: 1889918 xmax: 1183259 ymax: 1902959
## CRS: EPSG:3435
## zip objectid shape_area shape_len MetClnc bufferct
## 1 60601 27 9166246 19804.58 NA 2
## 2 60602 26 4847125 14448.17 NA 2
## 3 60603 19 4560229 13672.68 NA 1
## 4 60604 48 4294902 12245.81 NA 1
## 5 60605 20 36301276 37973.35 NA 1
## 6 60606 31 6766411 12040.44 NA 2
## geometry
## 1 MULTIPOLYGON (((1177742 190...
## 2 MULTIPOLYGON (((1181226 190...
## 3 MULTIPOLYGON (((1179499 190...
## 4 MULTIPOLYGON (((1174763 189...
## 5 MULTIPOLYGON (((1178341 189...
## 6 MULTIPOLYGON (((1174681 190...
Map density of buffers per census area:
tm_shape(areas) + tm_polygons(col = "bufferct", palette = "BuGn", n=5, style = "jenks") +
tm_shape(ptbuffers) + tm_fill(col = "gray90", alpha=0.2) +
tm_shape(points) + tm_dots(col = "gray10", alpha = 0.8, size = 0.3) 
4.6 Integrate & Explore
Let’s review: our master area file now has total number resources by zip and total number of walkable service areas by zip.
Using your new spatial file, see if you can answer some of these quetions using various queries:
Which zip codes have high rates of COVID and are not within a walking distance of a methadone MOUD?
Which zip codes have worse access to affordable rental units, low educational rates, and less walkable access to MOUDs?
What is the demographic and racial/ethnic characteristics of areas most vulnerable to high COVID rates in September 2020?
Generate different maps and outputs to drive your thinking and defend your hypothesis formation.